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Abstract 

Background: DNA methylation profiles differ among disease types and, therefore, can be used in disease diagnosis. 
In addition, large-scale whole genome DNA methylation data offer tremendous potential in understanding the role 
of DNA methylation in normal development and function. However, due to the unique feature of the methylation 
data, powerful and robust statistical methods are very limited in this area. 

Results: In this paper, we proposed and examined a new statistical method to detect differentially methylated loci 
for case control designs that is fully nonparametric and does not depend on any assumption for the underlying 
distribution of the data. Moreover, the proposed method adjusts for the age effect that has been shown to be 
highly correlated with DNA methylation profiles. Using simulation studies and a real data application, we have 
demonstrated the advantages of our method over existing commonly used methods. 

Conclusions: Compared to existing methods, our method improved the detection power for differentially 
methylated loci for case control designs and controlled the type I error well. Its applications are not limited to 
methylation data; it can be extended to many other case-control studies. 
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Background 

Essential for normal development and an influence on a 
variety of processes related to DNA integrity and func- 
tion, DNA methylation plays an important role for epi- 
genetic gene regulation in both development and disease 
[1]. It is associated with processes including genomic 
imprinting, X-chromosome inactivation, suppression of 
repetitive elements, and carcinogenesis [2-4]. Aberrant 
DNA methylation, such as hypomethylation of onco- 
genes and hypermethylation of tumor suppressor genes, 
leads to genomic instability and tumorigenesis [5-10]. 

With the development of high-throughput platforms, 
genome-wide analysis of large-scale DNA methylation 
patterns and their impacts on gene regulation have re- 
ceived a significant amount of attention. We are propos- 
ing an age-adjusted nonparametric method to detect 
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differentially methylated loci that can account for age ef- 
fects that has advantages over existing methods the limi- 
tations of which we explain next. Typically, methylation 
levels in Illumina methylation assays are quantified in 
terms of the /?-value calculated from the intensity ratio 
of methylated (M) to unmethylated (U) alleles. Specific- 



ally, ,8 



max(M.O) 



where M and U are the in- 



max(Mfi)+max(Ufi) + WO ' 

tensities of red and green dyes, respectively, for the 
GoldenGate and VeraCode Methylation assays, or the 
signals A and B, respectively, for the Illumina assay. The 
striking feature of the ^-values is that they are continu- 
ous and range from 0 (totally unmethylated) to 1 (fully 
methylated). 

With more and more DNA methylation data generated 
from the high-throughput DNA methylation platforms, 
powerful and efficient statistical methods to handle these 
complex data are becoming highly demanded. One of 
the important research topics in this area is to detect 
differentially methylated loci in case and control studies. 
The commonly used methods for this purpose are the 
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Student's f-test and linear regression. Recently, a number 
of model-based approaches have been proposed in the lit- 
erature. Siegmund [11] introduced a Bernoulli-lognormal 
mixture model to perform clustering analysis on methyla- 
tion data generated using MethyLight. Houseman [12] 
proposed a P-mixture model to classify different tissue 
types using a recursive-partitioning algorithm for high- 
dimensional data from Illumina arrays. Wang [13] devel- 
oped a likelihood based uniform-normal-mixture model 
to select differentially methylated loci between case and 
control groups using the Illumina array. The basic idea 
of Wang [13] is to describe the data using a three- 
component mixture model and treat completed methyl- 
ated, unmethylated, and partially methylated loci differ- 
ently. Wang [13] showed that, compared to the Student's 
f-test under some situations, their method increases de- 
tection power [13]. However, the aforementioned methods 
assume that the methylation profiles follow some special 
distributions that are known in terms of a finite number 
of parameters. Obviously if the underlying true distribu- 
tion is far from the proposed ones, such assumptions will 
lead to biased results in practice. 

Another complexity of the methylation study comes 
from the presence of other potential confounders such 
as age effects. As shown in [14-17], age is by far the 
strongest demographic risk factor for cancer, and there 
is substantial evidence that aging affects DNA methyla- 
tion of specific loci, including cancer-related genes. 
Based on these observations, it is necessary to adjust age 
effects in the analysis of detecting differentially methyl- 
ated loci. If the relationship between the methylation 
and age is more complex than a linear one, a traditional 
linear regression leads to inaccurate results. To solve this 
problem, Chen [16] proposed a method by first dividing 
the samples into several age groups and then combined 
the p-values obtained from each individual group to- 
gether to form a new test. This method has been shown 
to increase the power in contrast to the traditional i-test 
without age adjustment or regression model with age 
adjusted linearly. However, within each group, a p-value 
is obtained from a simple t-test that is less robust and 
thus leaves room for improvement. 

In this paper, we propose and examine a novel means to 
detecting differentially methylated loci and, that is, an age- 
adjusted nonparametric method that can account for age 
effects, given that substantial evidence exists that aging 
affects DNA methylation of specific loci, including cancer- 
related genes. Our method stems from the rank-based 
nonparametric methods that focus on a comparison of two 
entire empirical distribution functions rather than only two 
means. More specifically, we first divide the subjects into 
several age groups; then for each group, a nonparametric 
test is performed on each locus, and an individual p-value 
is reported. An overall p-value for each locus is estimated 



through combining all the individual p-values computed 
previously for that locus in different age groups. Our 
method does not depend on any distribution assumption 
but rather adjusts for age effects in an efficient way. We 
demonstrate the powerfulness of our method using both 
simulated and real datasets. 

Methods 

Assume all the subjects are from K different age groups. 
In the /c th group, k = 1,. . .,K, there are njk control sub- 
jects and n 2 k case subjects. For each DNA methylation 
marker, let y^ t i = 1,. . .,nj h j = 1,2, k= 1,. . .,K, denote the 
observation of p-value for the i th subject in j treatment 
(1 for control and 2 for case) and /c th age group. To ad- 
just the age influence on the methylation level, the linear 
model takes the form 

Jijk = a* treatment j + bk*agek + e^, 
fori = 1, . . . , n jk ,j = 1, 2,k = 1, . . . ,K, 

Where fl y and bk are regression coefficients and fol- 
lows a i.i.d normal distribution. The normality assump- 
tion is clearly invalid for the p-values which have limited 
range between 0 and 1. Moreover, the relationship be- 
tween the (3-value and age is likely to be more compli- 
cated than a linear one. Therefore more powerful and 
robust nonparametric methods are desirable. 

Here we propose a new nonparametric method which 
does not depend on any distribution assumption and 
meanwhile allows for the adjustment of covariates. We 
process as follows. For each age group k (k = 1,. . .,K), 
we test the difference between the control group y llk 
(i = 1,. . .,n lk ) and the case group y i2 k (i = 1,- ■ -,n 2 k)- Our 
goal is to test whether or not the two methylation 
groups follow the same distribution. Toward this end, 
the Wilcoxon rank sum test is a useful tool when 
there are reasons to believe that the outcome variables 
of interest may fail certain distributional assumptions 
required for parametric methods. However, as dis- 
cussed in Baumgartner [18], Wilcoxon rank sum test 
is not suitable for situations where the expected values 
of the two populations are close to each other. To 
overcome this problem, they proposed a more power- 
ful nonparametric test to handle the general two-sided 
two-sample problem [18]. Neuhaeuser further extend- 
ed the two-sided two-sample test to a one-sided test 
that can detect if one population is stochastically lar- 
ger than the other [19]. Our p-value calculation for 
each age group is based on Neuhaeuser's one-sided 
test whose statistics can be explicitly formulated as 
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Here G„ i = l,2,..., n lk and Hpj = 1,2,. . .,n 2 k are the 
ranks of the samples from the k th case and control groups, 
respectively. Due to the intractable asymptotic distribution 
for the test statistics B, it is hard to find a close form for 
the relationship between p-value and B. We will use nu- 
merical fit to approximate this relationship. We first obtain 
the empirical distribution of B based on 10 permutations 
and then fit the distribution function piecewise exponen- 
tially to obtain the empirical relationship. The p-value for 
the k th age group can be calculated according to this em- 
pirical formula. 

As a consequence, we have K p-values from the left-sided 
test, denoted by p [k (k= 1,. . .,K), and K p-values from the 
right-sided test, denoted by p rk = 1-pik (k = 1,. . .,K). Then 
combining K left-sided p-values together gives a statistic 



T, = -2^2 log(p !k ) 



k-l 



Similarly, combining K right-sided p-values together 
gives a statistic 



T r = -2^2 l °g(Prk) 
k=l 

Under the null hypothesis of no difference between 
the two treatment groups, pi k and p rk are uniform [0,1] 
random variables for k= 1,. . .,K. Therefore, according to 
Fisher [20], both T[ and T r will follow a chi-square distri- 
bution with degree of freedom 2 K. We define a new 
variable: 

T = max{Ti, T r }, 

then we have [21] 

2a — a 2 < Pr(X > x)< 2a, where a 

= 1 — F^2 (x), and F x 2 is the CDF of xlic 

Thus, for small a, we can approximate the p-value of 
T by its upper bound 2a as 

Pr(7 > x)~2a 

More details can be found in [21-23]. For large a, we 
will fit a formula empirically through permutation. We 



call the above proposed method "combined Baumgartner- 
Weifi-Schindler (BWS) test". 

Results and discussion 

Empirical fit of the p-value formula for one-sided BWS test 

The asymptotic distribution function of B is complex 
and in practice permutation method is often used. The 
permutation results depend on the sample size. But as 
mentioned in Baumgartner [18], for a two-sided test, the 
asymptotic distribution can be approximated by the per- 
mutation method quite well even with a small sample 
size (as small as 10). We first derive the empirical formula 
to fit the asymptotic distribution using the permutation 
method. Toward this end, we sample two subpopulations 
from the same distribution (e.g. standard normal), each of 
which has a sample size of 30. Then, the whole popula- 
tions are permuted 10 times, and a one-sided BWS test 
statistic B is calculated for each permuted sample. Then 
we fit the empirical cumulative distribution of B using a 
piecewise exponential function to arrive at the following 
empirical formula 



P(B) 



r e -0.699-1.255*5 (()<£< L5 ) 

e -0.895-1.153*B+0.0173*B^ 1 _ 5 < B < an d 



P(-B) 



, B (B > 9) 



l-P(B). 



The node points we used here are 1.5 and 9. We find 
that the choice of node points has very little influence 
on the final analysis results for both simulated and real 
data. Note that the sample size we used for deriving this 
formula is 30. We have also tried some other choices 
and found that the results are quite similar. Thus, we 
recommend the above formula to be used in practice for 
problems with a sample size larger than 10. 

Simulation results 

The first simulation settings are for the evaluation of the 
type I error rate for the proposed method. For the pur- 
poses of comparison, we also include the results from 
the combined f-test proposed in [16], linear regression 
and combined Wilcoxon methods for all simulated and 
real datasets. We assume that there are 6 age groups, 
and each group includes 100 subjects, 50 controls and 
50 cases. For each age group, we also assume the [3- 
values follow a three-component mixture distribution as 
in Wang [13]. Let Ti and x 2 be the two threshold values. 
The first and the second components are uniform dis- 
tributions £/[o,ji] and ^j r2 i]- The third component is a 

truncated normal distribution Nr Tlir2 i (ft, a 2 ) . The prob- 
abilities for a measurement to fall into these three com- 
ponents are Tii, Ji2, JI3 respectively. Under the null 
hypothesis, the two treatment groups are sampled from 
the same distribution. The mean of the truncated 
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Table 1 Estimated Type I error rates at significant level 0.05 based on the four methods under different parameter 
settings of the three-component mixture distributions with t, = 0.1, t 2 = 0.9 and 6^ = 0.05 



Parameter Setting 




"2 


"3 




a 


f-test 


Linear regression 


Wilcox 


BWS 


1 


0.3 


0.5 


0.2 


0.3 


0.1 


0.0513 


0.0521 


0.0514 


0.0458 


2 


0.4 


0.5 


0.1 


0.2 


0.1 


0.0495 


0.0494 


0.0519 


0.0492 


3 


0.4 


0.5 


0.1 


0.3 


0.2 


0.0511 


0.0519 


0.0503 


0.0495 


4 


0.5 


0.1 


0.4 


0.3 


0.2 


0.0528 


0.0521 


0.0544 


0.0511 


5 


0.4 


0.2 


0.4 


0.2 


0.1 


0.0509 


0.0510 


0.0472 


0.0464 



normal distribution is taken to be ^ + /c*f5 fi for the k l age 
group. The simulation is repeated 10000 times. Table 1 
lists the proportion of times that null is rejected using 
the four different methods under different parameter 
settings. Table 1 shows that the nominal type I error rate 
of 0.05 is well controlled by all methods. 

The second simulation settings are for assessing the 
power of the proposed method under alternative hy- 
pothesis, i.e. the case and control subjects are sampled 
from different distributions. We still assume that the p 1 - 
values follow the similar three-component mixture dis- 
tributions. Two scenarios are considered here. In the 
first scenario, we use different means for different treat- 
ment groups. More specifically, we let the truncated nor- 
mal mean for the control sample be constant fi, but for 
the case sample vary as fi + k* 8^ for the k th age group. 
We replicate the simulation 10000 times for each scenario. 
The power is defined as the proportion of times that the 
p-value is less than 0.05. The first two rows in Table 2 list 
the results for this scenario. In the second scenario, we 
let the two threshold values vary as tj + kS T and T2-k8 T for 
the /c th case group but keep them constants as and 
for all control groups. The results are listed in the last two 
rows of Table 2. For the first scenario, the mean values are 
different for the case and control groups. As expected, all 
four methods have increasing powers as the signal in- 
creases. For the second scenario, the expected values are 
the same, but the variances are different between the two 
treatment groups. In this situation, our proposed method 
is more powerful in detecting the difference than the other 
three methods. Therefore, the proposed method can de- 
tect not only the location difference but also the scale dif- 
ference between the two distributions. 

The third settings assume that the [3-values for both the 
case and control subjects follow the beta-distributions. Let 



Si = S2 = 4. For the case group, the (3-values are sampled 
from a beta-distribution Beta(s\ + 8, s 2 - 8). For the control 
group, the [3-values are sampled from a beta-distribution 
Beta(5si + 8, 5s2 - 8). Here 8 takes six different values, -3X, 
-2X, -A, 0.SX, X, and l.SX, one for each age group. 

Based on the above setting, it can be easily shown that 
the mean of the distribution for the case group is 
while the mean of the distribution for the control group 
is s i + f' 5 The mean difference between the two treatment 

Si+S 2 

groups is which will increase with 8. Table 3 lists 
the empirical powers of four methods for different X 
values. In all situations, the most powerful method is 
combined DWS. Since the variance of two distributions 
are different, even in situation where 8 = 0, the power of 
the proposed method can still reach 0.67, whereas the 
other three methods have no power at all. As A de- 
creases, the mean differences become smaller and make 
it more difficult to distinguish between the two treat- 
ment groups. As expected, the powers decrease for all 
methods as A decreases. For small A, the power from the 
combined DWS method is much bigger than those from 
the other three methods. The performances of the com- 
bined t-test and combined Wilcoxon methods are quite 
similar, and both are much better than the linear regres- 
sion method. 



Real data results 

We also applied our proposed method to the United 
Kingdom Ovarian Cancer Population Study (UKOPS) 
[15] to select differentially methylated loci between 
ovarian cancer cases and healthy controls. The data were 
created using Illumina Infinium Human Methylation27 
Beadchip and downloaded from the NCBI Gene 



Table 2 Estimated powers of the four methods at significant level 0.05 under different parameter settings for the 
three-component mixture distributions with r, = 0.1, t 2 = 0.9 

Parameter Setting tt, n 2 n 3 y a t-test Linear regression Wilcox BWS 

6^ = 0.03 0.3 0.5 0.2 0.3 0.1 0.475 0.479 0.749 0.836 

6^ = 0.05 0.3 0.5 0.2 0.3 0.1 0.889 0.892 0.951 0.988 

<5 r = 0.03 0.45 0.1 0.45 0.5 0.05 0.048 0.047 0.078 0.727 

<5 r = 0.06 0.45 0.1 0.45 0.5 0.05 0.048 0.047 0.092 0.877 
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Table 3 Change of the power with A for four different 
methods when the distributions are Betais, +6, s 2 - S) 
and Beta{5 s-,-6, 5s 2 + 6) for the case and control 
groups; 8 takes the values of -3X, -2X, -X, O.SX, X, and 1.5X 
for the six age groups and s n = s 2 = 4 





Combined 
f-test 


Linear 
regression 


Combined 
Wilcoxon 


Combined 
BWS 


0 


0.056 


0.053 


0.073 


0.669 


0.05 


0.075 


0.057 


0.094 


0.703 


0.1 


0.182 


0.087 


0.200 


0.832 


0.15 


0.402 


0.141 


0.412 


0.939 


0.2 


0.701 


0.212 


0.687 


0.988 


0.25 


0.911 


0.298 


0.893 


0.999 



Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) 
under the accession number GSE19711. There were 274 
control samples and 131 pre-treatment case samples, 
and the number of loci was 27578. For the data quality 
control, we removed 29 patients (15 controls and 14 
treatments) with BS conversion efficiency value < 4000 
or coverage rate < 95% [15]. For each treatment group, 
the samples were divided into 6 age groups (50-55, 55- 
60, 60-65, 65-70, 70-75, and 75 and over). We further 
removed 12 patients in the pre-treatment group whose 
ages exceeded 78 since there were no such patients in 
the control group. The final sample size for each indi- 
vidual group is the same as the one listed in Table 2 of 
Chen [16] except that the "75 and over" group has 13 
pre-treatment samples instead of 25. This dataset was 
analyzed by both Wang [13] and Chen [16] in their pa- 
pers. Wang [13] did not consider the age effect, and only 
96 cases and 136 controls were included in their analysis 



after further screening; while Chen [16] included the 12 pa- 
tients with ages exceeding 78; thus their results are different 
from ours even though the same method was used. 

Figure 1 shows the scatter plots for each locus based 
on the negative loglO p- values derived from four dif- 
ferent methods. Figure 1 (a) plots the results for the 
combined DWS test and the linear regression method, 
Figure 1 (b) plots the results for the combined DWS test 
and combined t-test, Figure 1 (c) plots the results for the 
combined DWS test and combined Wilcoxon-test. From 
Figure 1 (a), it can be seen that most of the loci with 
small p-value in the linear regression tend to have small 
p-value in our proposed method as well. However, there 
exist many loci whose p-values are large in the linear re- 
gression but small and significant in the combined DWS 
test, i.e., those points in Figure 1 (a) with x-values close 
to zero but y-values large than 3. This indicated that our 
proposed method is more powerful than the linear regres- 
sion method in detecting the differentially methylated loci. 
Similar conclusions can be drawn from Figure 1 (b) and 
Figure 1 (c) for the comparison with the combined i-test 
and combined Wilconxon test respectively. 

Table 4 lists the number of the loci detected by each 
of the four tests based on four different significant levels, 
10~ 3 , 10" 4 , 10" 5 , 10' 6 . Clearly at the same significant level, 
in terms of the number of significant loci detected, the 
most powerful method is combined DWS, the next one 
is combined Wilcoxon test, and then combined i-test 
and linear regression. Table 4 also reports the numbers 
of significant loci obtained by pairs of the proposed test 
and each of the other three methods for given significance 
levels. For example, when the cutoff p-value is 10~ 3 , the 
combined BWS, the linear regression, the combined t-test, 
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Figure 1 Scatter plots for negative Iog10 p-values based on different methods. The left panel is for the comparison between combined 
DWS and linear regression. The middle panel is for the comparison between combined DWS and combined t-test. The right panel is for the 
comparison between combined DWS and combined Wilcoxon test. 
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Table 4 Number of loci with p-values less than the given cutoff significance levels from different methods 



Cutoff p-value 


Linear 


Combined 


Combined Wilcoxon 


Combined BWS 


From both 


From both 


From both 




regression (1) 


t-test (II) 


test (III) 


test (IV) 


1 and IV 


II and IV 


III and IV 


10- 3 


2038 


2754 


3143 


3387 


1884 


2659 


3081 


10- 4 


1438 


1879 


2152 


2321 


1352 


1795 


2117 


10- 5 


1120 


1343 


1495 


1653 


1059 


1286 


1479 



10~ 6 894 982 1109 1222 844 931 1099 



and the combined Wilcoxon obtained 3387, 2038, 2754, 
and 3143 significant loci, respectively. However, among 
those 3387 loci that have p-values less than 10 3 using the 
combined BWS method, there were 1884, 2659, and 3081 
loci overlapping with the linear regression, the combined 
f-test, and the combined Wilcoxon methods, respectively. 
In other words, there were 1503, 728, and 306 significant 
loci were only obtained by the proposed test, but not by 
the linear regression, the combined t-test, and the com- 
bined Wilcoxon, respectively. In contrast, at the same sig- 
nificance level 10 3 , there were only 154, 95, and 62 loci 
whose p-values from the new method are larger than 10~ 3 
but less than 10' from the linear regression, the combined 
f-test, and the combined Wilcoxon, respectively. There- 
fore most of the loci detected by the other three methods 
are also detected by the proposed method. However, 
for the same cutoff level, there were many loci that were 
significant in the proposed method but not in the other 
three methods, a point that clearly demonstrated the 
advantages of our method over those three. 

Discussion 

To study whether or not the proposed method can con- 
trol type I error rate as well, we created pseudo case and 
control samples. The way we did this was to first ran- 
domly divide the original control subjects into two parts 



for each age group. Then we put one part into the new 
pseudo-control group and the other one into the new 
pseudo-case group. The distribution of p-values from 
applying the proposed method to this new case-control 
data set is shown in Figure 2 (a). It is very close to uni- 
form distribution, and this finding is further confirmed 
by the qq-plot against the uniform [0,1] distribution as 
illustrated in Figure 2(b). Therefore, our method in- 
creased the detection power while it simultaneously con- 
trolled the type I error rate. 

In this paper we chose different cutoff p-values to 
compare the performance of the proposed test with 
others. We did not consider the multiple testing issue, 
which is an important but difficult task for this area [4] 
and other areas where a large number of correlated vari- 
ables are tested simultaneously [24-28]. The traditional cor- 
rection methods for multiple comparisons, such as 
Bonferroni correction, are inappropriate here as they are 
too conservative due to the fact that many loci from the 
same gene are highly positively correlated. To account for 
the correlations among loci, methods based on the concept 
of "effective number" may be adopted [29]. 

There are many ways to combine p-values from inde- 
pendent tests [20,30-32]. In this paper, we chose to use 
Fisher test due to its robustness. It is possible, however, 
that other methods may be more powerful under some 



o.o 



- 1 - 

0.2 



T" 



£ ° 



1.0 




0.4 0.6 0.8 

Combined DWS p-value Combined DWS p-value 

Figure 2 Test the distribution of p-values by applying the proposed method to a newly created case-control data based on the 
samples from the original control group. The left panel is for the histogram and the right is a qq-plot against the uniform [0,1] distribution. 
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certain conditions. The combined p-value method used 
in this paper is based on the assumption that the test for 
every individual age group is independent from each 
other. However, if this assumption is questionable, the 
current combined p-value method needs to be modified 
such that it can handle the correlations among the indi- 
vidual tests as well. This is another research topic we 
will pursue in a future paper. 

Conclusions 

In case-control methylation studies, the underlying distri- 
bution of the p-values is rarely known in advance, and 
clearly the normality assumption is not valid. Parametric 
models can be powerful if the assumptions for the models 
are valid, but they can also lead to biased results if the 
underlying true distribution is far deviated from the im- 
posed ones. Thus, it is desirable to choose a powerful yet 
robust statistical test that does not depend on any under- 
lying distribution assumption. In this article we proposed 
and examined a rank-based nonparametric method to de- 
tect differentially methylated loci. Through simulation, we 
showed that our proposed method is as powerful as the 
linear regression, t-test and Wilcoxon rank sum test 
methods if the mean differences between the two treat- 
ment groups are large. However, our method substantially 
outperformed the others in situations where the mean dif- 
ferences between the two groups were small while the 
variance differences were large. 

Note that the age effects are strongly associated with 
methylation, and the ignoring age effect will cause a loss 
of power or a large number of false positives. Another 
advantage of the proposed method over many existed 
ones is that it combined the nonparametric test together 
with age adjustment. Our next goal was to generalize 
our method to adjust for more confounders other than 
the age such that it can have a much wider application 
in methylation research. 
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